
DOC: spell out low_memory split-value hazard in read_csv/read_table#65636

Merged
mroeschke merged 1 commit into
pandas-dev:mainfrom
jbrockmendel:bug-22194
May 16, 2026

Conversation

@jbrockmendel
Member

closes #22194

Summary

  • Expand the low_memory parameter docstring in :func:read_csv and :func:read_table to describe the practical consequence of per-chunk type inference: the same literal can land in the resulting object-dtype column as both an int and a str, so equality / drop_duplicates / isin only match a subset of the rows holding that value.
  • Point users at :class:~pandas.errors.DtypeWarning, whose own docstring already has a worked example.
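The consequence described above can be illustrated with a small sketch. It does not call read_csv; it hand-builds an object-dtype Series as a stand-in for a column whose chunks were inferred differently (the name `mixed` and the sample values are illustrative assumptions, not taken from the PR), and then shows how equality, isin, and drop_duplicates treat the int 1 and the str "1" as distinct values.

```python
import pandas as pd

# Assumption: a stand-in for a column that came back from read_csv with
# some chunks inferred as int and others as str. Built by hand here so the
# behavior is deterministic and does not depend on file size.
mixed = pd.Series([1, 1, "1", "X", 1], dtype=object)

# Equality only matches rows whose Python type agrees with the operand.
int_hits = int((mixed == 1).sum())      # 3 rows hold the int 1
str_hits = int((mixed == "1").sum())    # 1 row holds the str "1"

# isin has the same blind spot: the str "1" is not a member of [1].
isin_hits = int(mixed.isin([1]).sum())  # 3

# drop_duplicates keeps both spellings of the "same" literal.
deduped = mixed.drop_duplicates().tolist()  # [1, "1", "X"]

# One possible cleanup after the fact: normalize everything to str.
normalized_hits = int((mixed.astype(str) == "1").sum())  # 4

print(int_hits, str_hits, isin_hits, deduped, normalized_hits)
```

Normalizing with astype(str) is only one option and is shown to make the contrast visible; whether str or a numeric type is the right target depends on the data.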

Notes

  • Docstring-only — no behavior change.
  • Issue #22194 ("low_memory=True in read_csv leads to non documented, silent errors") has been open since 2018 asking for this to be called out more clearly. The DtypeWarning class docstring covers it, but the read_csv parameter blurb (the surface most users actually read) only said "possibly mixed type inference," which is too abstract to convey the hazard.
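For readers who hit the hazard, the long-standing advice in the read_csv documentation is to either pass an explicit dtype or set low_memory=False. The sketch below shows only the shape of those two calls. The CSV text and column names are made up for illustration, and an input this small would not be split into chunks in the first place, so it demonstrates the arguments rather than reproducing the warning.

```python
import io

import pandas as pd

# Assumption: a tiny illustrative CSV; real files that trigger per-chunk
# inference are far larger than this.
csv_text = "a,b\n1,x\nX,y\n1,z\n"

# Option 1: declare the column type up front so no inference happens.
df_typed = pd.read_csv(io.StringIO(csv_text), dtype={"a": str})
all_str = all(isinstance(v, str) for v in df_typed["a"])
typed_hits = int((df_typed["a"] == "1").sum())  # 2

# Option 2: parse the file in one pass so each column is inferred once.
df_full = pd.read_csv(io.StringIO(csv_text), low_memory=False)
full_hits = int((df_full["a"] == "1").sum())  # 2

print(all_str, typed_hits, full_hits)
```

An explicit dtype is usually the more robust choice, since low_memory=False trades higher peak memory for consistent inference.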

Test plan

  • pre-commit hooks pass

🤖 Generated with Claude Code

Expand the low_memory parameter blurb to describe the practical
consequence of per-chunk type inference: the same literal can land in
the resulting object-dtype column as both an int and a str, so equality
checks match only a subset of the rows. Points users at DtypeWarning.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@mroeschke mroeschke added this to the 3.1 milestone May 16, 2026
@mroeschke mroeschke added the Docs label May 16, 2026
@mroeschke mroeschke merged commit dacaec7 into pandas-dev:main May 16, 2026
51 checks passed
@mroeschke
Member

Thanks @jbrockmendel

@jbrockmendel jbrockmendel deleted the bug-22194 branch May 17, 2026 02:33

Development

Successfully merging this pull request may close these issues.

low_memory=True in read_csv leads to non documented, silent errors

2 participants